Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

Authors

  • Fanhua Shang
  • Yuanyuan Liu
  • James Cheng
  • Jiacheng Zhuo
Abstract

Recently, research on accelerated stochastic gradient descent methods (e.g., SVRG) has made exciting progress (e.g., linear convergence for strongly convex problems). However, the best-known methods (e.g., Katyusha) require at least two auxiliary variables and two momentum parameters. In this paper, we propose a fast stochastic variance reduction gradient (FSVRG) method, in which we design a novel update rule with Nesterov's momentum and incorporate the technique of growing epoch size. FSVRG has only one auxiliary variable and one momentum weight, and thus it is much simpler and has much lower per-iteration complexity. We prove that FSVRG achieves linear convergence for strongly convex problems and the optimal O(1/T) convergence rate for non-strongly convex problems, where T is the number of outer iterations. We also extend FSVRG to directly solve problems with non-smooth component functions, such as SVM. Finally, we empirically study the performance of FSVRG for solving various machine learning problems such as logistic regression, ridge regression, Lasso, and SVM. Our results show that FSVRG outperforms state-of-the-art stochastic methods, including Katyusha.

Keywords: Stochastic optimization, variance reduction, momentum acceleration, non-strongly convex, non-smooth
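
For readers unfamiliar with the ingredients named in the abstract, the following is a minimal sketch of an SVRG-style inner loop combined with a single Nesterov-style momentum weight and a growing epoch size. It only illustrates the general scheme, not the authors' exact FSVRG update rule; the names grad_i, eta, theta, and the epoch-doubling schedule are assumptions made for the example.

import numpy as np

def svrg_with_momentum(grad_i, n, x0, eta=0.1, theta=0.9, epochs=20, m0=None):
    """Sketch of an SVRG-type method with one auxiliary variable and one
    momentum weight. grad_i(x, i) returns the gradient of the i-th component
    function f_i at x; n is the number of components; eta is the step size;
    theta is the momentum weight. Not the exact FSVRG update from the paper."""
    x_tilde = np.array(x0, dtype=float)   # snapshot point of the current epoch
    y = x_tilde.copy()                    # the single auxiliary variable
    m = m0 if m0 is not None else max(n // 4, 1)
    for _ in range(epochs):
        # Full gradient at the snapshot: the variance-reduction anchor.
        mu = sum(grad_i(x_tilde, i) for i in range(n)) / n
        x = x_tilde.copy()
        for _ in range(m):
            i = np.random.randint(n)
            # Variance-reduced stochastic gradient, evaluated at the extrapolated point y.
            v = grad_i(y, i) - grad_i(x_tilde, i) + mu
            x_new = y - eta * v
            # Nesterov-style extrapolation with a single momentum weight.
            y = x_new + theta * (x_new - x)
            x = x_new
        x_tilde = x          # new snapshot
        m = min(2 * m, n)    # growing epoch size
    return x_tilde

As a usage example, for a least-squares problem with data matrix A and targets b, one could pass grad_i = lambda x, i: A[i] * (A[i] @ x - b[i]).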

Related papers

Stochastic Proximal Gradient Descent with Acceleration Techniques

Proximal gradient descent (PGD) and stochastic proximal gradient descent (SPGD) are popular methods for solving regularized risk minimization problems in machine learning and statistics. In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. This method incorporates two acceleration techniques: one is Nesterov’s acceleration method, and the othe...
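
As a concrete illustration of one of the building blocks mentioned here, the sketch below combines a proximal gradient step (soft-thresholding for an l1 regularizer) with Nesterov's extrapolation, in the style of FISTA. The mini-batch setting and the second acceleration technique of the cited paper are omitted, and grad, eta, and lam are placeholder names.

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_proximal_gradient(grad, x0, eta, lam, iters=100):
    """Nesterov-accelerated proximal gradient (FISTA-style) for
    min_x f(x) + lam * ||x||_1, where grad(x) returns the gradient of f."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    t = 1.0
    for _ in range(iters):
        x = soft_threshold(y - eta * grad(y), eta * lam)    # proximal gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # momentum schedule
        y = x + ((t - 1.0) / t_next) * (x - x_prev)         # Nesterov extrapolation
        x_prev, t = x, t_next
    return x_prev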

Finite Sum Acceleration vs. Adaptive Learning Rates for the Training of Kernel Machines on a Budget

Training predictive models with stochastic gradient descent is widespread practice in machine learning. Recent advances improve on the basic technique in two ways: adaptive learning rates are widely used for deep learning, while acceleration techniques like stochastic average and variance reduced gradient descent can achieve a linear convergence rate. We investigate the utility of both types of...

Stochastic Variance-Reduced ADMM

The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning. Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradient, leading to SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration complexities. However, their space requirements can still be high. In this paper, we propose an inte...

Fast-and-Light Stochastic ADMM

The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning. Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradient, leading to SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration complexities. However, their space requirements can still be high. In this paper, we propose an int...
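
To make the combination of stochastic ADMM and variance reduction concrete, here is a rough sketch of one inner epoch for min_x (1/n) sum_i f_i(x) + lam * ||z||_1 subject to x = z, using an SVRG-style variance-reduced gradient inside a linearized x-update. The variable names (rho, eta, x_tilde) and the particular linearization are illustrative assumptions and do not reproduce the exact SAG-ADMM, SDCA-ADMM, or SVRG-ADMM updates discussed in these papers.

import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svrg_admm_epoch(grad_i, n, x, z, u, x_tilde, lam, rho=1.0, eta=0.1, m=100):
    """One epoch of a stochastic ADMM sketch with SVRG-style variance reduction.
    grad_i(x, i) is the gradient of f_i at x; (x, z, u) are the primal, auxiliary,
    and scaled dual variables; x_tilde is the snapshot used for variance reduction."""
    mu = sum(grad_i(x_tilde, i) for i in range(n)) / n      # full gradient at snapshot
    for _ in range(m):
        i = np.random.randint(n)
        v = grad_i(x, i) - grad_i(x_tilde, i) + mu          # variance-reduced gradient
        # Linearized x-update:
        # argmin_x <v, x> + (rho/2)||x - z + u||^2 + ||x - x_k||^2 / (2*eta)
        x = (x / eta + rho * (z - u) - v) / (1.0 / eta + rho)
        z = soft_threshold(x + u, lam / rho)                # z-update (prox of the l1 term)
        u = u + x - z                                       # scaled dual update
    return x, z, u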

Fast Asynchronous Parallel Stochastic Gradient Descent

Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a f...

Journal:
  • CoRR

Volume: abs/1703.07948  Issue: -

Pages: -

Publication year: 2017